- Optical character recognition system InterRec
- Version 1.0
-
- User's Manual
-
- 1. Purpose of the InterRec system
- The InterRec system of optical character recognition is
- intended for the recognition of texts entered into a personal
- computer with the aid of any kind of equipment for the input
- of optical information.
- The source material for InterRec's work is a graphics
- file of the text acquired with the aid of a device for the
- input of optical information and stored on a hard disk or a
- diskette in PCX format (two colors).
- The result of the system's work is a file containing the
- ASCII representation of the source graphics image of the text.
-
- 2. General information
- InterRec is an omnifont system. This means that it does
- not need any kind of setup or teaching for the recognition of
- a specific font.
- The text being processed must be written in a graphics
- file with a resolution not less than 200 pixels per inch. A
- resolution of 300 pixels per inch is recommended, which
- increases the accuracy of recognition (with an insignificant
- increase in processing time).
- The character height must be within the following limits:
- for a resolution of 200 pixels per inch: 6 to 50 points
- (2.2 to 19 mm);
- for a resolution of 300 pixels per inch: 4 to 30 points
- (1.5 to 11 mm).
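The limits above can be expressed as character heights in pixels. The following is a minimal sketch, not part of InterRec: the function names are hypothetical, and the millimetre limits are simply copied from the list above.

```python
MM_PER_INCH = 25.4

# Character-height limits in millimetres, copied from the manual,
# keyed by scanning resolution in pixels per inch.
HEIGHT_LIMITS_MM = {200: (2.2, 19.0), 300: (1.5, 11.0)}


def height_in_pixels(height_mm: float, dpi: int) -> float:
    """Convert a character height in millimetres to pixels."""
    return height_mm / MM_PER_INCH * dpi


def height_supported(height_mm: float, dpi: int) -> bool:
    """Check a character height against the limits for this resolution."""
    low, high = HEIGHT_LIMITS_MM[dpi]
    return low <= height_mm <= high
```

Under this conversion both lower limits correspond to roughly the same height in pixels (about 17 to 18), which suggests why a higher resolution allows smaller type.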
- The system has modes of operation that are
- user-selectable: ENGLISH and RUSSIAN for the recognition of
- texts in the English and Russian languages respectively.
- When working in the ENGLISH mode InterRec recognizes the
- capital and small letters of the English alphabet, numbers and
- the following symbols: ! ? . , : ; " / $ % & * ( ) - + = in
- vertical and italics, in normal and wide lines.
- When working in the RUSSIAN mode InterRec recognizes the
- capital and small letters of the Russian alphabet, numbers and
- the following symbols: ! ? . , : ; " / % * ( ) - + = in
- vertical, in normal and wide lines.
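A text can be checked in advance against the character sets listed above. The sketch below is only an illustration under stated assumptions: the function name is hypothetical, spaces and line breaks are assumed to be acceptable, and the Russian alphabet is assumed to include the letter Ё.

```python
import string

# Symbols copied from the two lists in the manual.
ENGLISH_SYMBOLS = set('!?.,:;"/$%&*()-+=')
RUSSIAN_SYMBOLS = set('!?.,:;"/%*()-+=')

# Assumption: spaces and line breaks are always acceptable.
COMMON = set(string.digits) | {" ", "\n"}

# Cyrillic capital and small letters; including Ё/ё is an assumption.
RUSSIAN_LETTERS = {chr(c) for c in range(ord("А"), ord("я") + 1)} | {"Ё", "ё"}

ALLOWED = {
    "ENGLISH": set(string.ascii_letters) | COMMON | ENGLISH_SYMBOLS,
    "RUSSIAN": RUSSIAN_LETTERS | COMMON | RUSSIAN_SYMBOLS,
}


def unsupported_characters(text: str, mode: str) -> list:
    """Return the characters of text that the given mode does not list."""
    return sorted(set(text) - ALLOWED[mode])
```

For example, the dollar sign and the ampersand appear only in the ENGLISH list, so they are reported when the mode is RUSSIAN.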
- In both modes InterRec recognizes letters, numbers and
- the above listed special symbols printed in proportional fonts
- on a laser printer, on a dot matrix printer (in
- near-letter-quality mode), on a typewriter or on a typesetting
- machine.
- To guarantee that the system works correctly, there must
- not be any characters other than those listed above, or any
- underlines. The presence of any tables or graphic elements
- such as drawings, sketches or graphs may lead to the
- rejection of the entire page for character recognition.
- The accuracy of recognition depends on the quality of the
- graphics copy of the text and, with high-quality source
- material, reaches 99%. The accuracy of recognition decreases
- with the presence of a large number of joined characters, and
- also with the presence of breaks in the representation of
- characters.
- The time for recognizing one page (2000 characters) does
- not exceed 1.5 minutes (with a resolution of 300 pixels per
- inch and using an IBM PC AT with a clock speed of 12 MHz).
- The system runs on any personal computer that is
- compatible with an IBM PC XT/AT and has not less than 512
- Kbytes of real memory and an EGA or VGA video adapter. A mouse
- makes the work with the system more comfortable, though it is
- not a required device. The operating system must be MS-DOS
- version 3.3 or higher.
-
- 3. The complete set
- A 5.25" DS/DD diskette containing:
- the program interrec.exe (intrdemo.exe for demo version)
- and the files needed to work with it: eng.trr and
- rus.trr; the instruction files for the user in English
- (readme.eng) and in Russian (readme.rus); files that
- let you read the files containing recognized Russian
- texts on the screen (egarfont.com, vgarfont.com,
- egarfont.fnt, vgarfont.fnt).
-
- The demo version of the InterRec system does not write the
- recognized text to a file. Therefore you do not need the
- files egarfont.com, vgarfont.com, egarfont.fnt, vgarfont.fnt.
-
- 4. How to work with InterRec
- The EXE file used to load and start the program is called
- interrec.exe.
- After startup, a window containing some brief information
- about the system is displayed on the screen. To enter the main
- menu you must press any key on the keyboard or any mouse
- button.
- The main menu displays the system control functions which
- are divided into four groups corresponding to the following
- items of the main menu: File, Screen, Language, Start. To
- invoke a function of the main menu you should either
- (a) place the highlighted bar on the selected function
- using the left and right arrow keys, and press the
- Enter key;
- (b) move the mouse cursor to the necessary item and press
- the left mouse button twice (after the first click the
- highlighted bar will move to the necessary item).
- Items of the submenus that drop down from the main menu are
- selected in the same way but using the up and down arrow
- keys. To return from a submenu to the main menu you should
- either:
- (a) press Esc;
- (b) press the right button of the mouse if its cursor is
- within the submenu, or press the left button if the
- cursor is outside this area.
-
- Now, let us discuss one by one the functions of the main
- menu.
-
- 4.1. File
- The purpose of this function is to select both a name for
- the file which stores the graphic image of the text being
- recognized, and a name of the file which will store the ASCII
- representation of the text. Here the exit control is also
- located. Accordingly, the submenu has three items: Load.PCX,
- Load.TXT, Exit.
-
- 4.1.1. Load.PCX.
- This item is used to select the name of the file
- containing the graphics image of the text being recognized.
- This file must have the .PCX extension.
- When you select this option, a dialog box appears in the
- center of the screen in which you can see: an input line; a
- scroll window, in which you look for the file to be
- recognized; and buttons labeled Ok and Cancel.
- If using a keyboard, press the Tab key repeatedly to
- transfer control between the input line and the Ok and
- Cancel buttons.
- While entering the system, the input line displays the
- name of the directory where InterRec is located, and a .PCX
- searching mask.
- The scroll window has 7 lines. A line designated as ..\
- lets you call up the directory of the higher level. The
- remaining lines contain filenames of the current directory
- corresponding to the mask, i.e. filenames with the .PCX
- extension, and also subdirectory names. If these filenames and
- subdirectory names are rather numerous and cannot be displayed
- simultaneously in the given lines, use the scrolling function
- to view them all. You should either press the down arrow key
- or move the mouse cursor to the down arrow of the scroll bar
- and press the left button of the mouse.
- Reverse scrolling is performed either with the up arrow
- key or by moving the mouse cursor to the up arrow of the
- scroll bar and pressing the left button of the mouse.
- You may also enter the directory you need (and change the
- current drive if necessary) by direct input of the pathname in
- the input line. To open this line for editing, do one of the
- following:
- (a) if using a keyboard, press the Tab key until a cursor
- in the form of an underline appears in this line;
- (b) if using a mouse, move the mouse cursor to this line
- and press the left button of the mouse.
- The underline cursor points to the position in the line
- where you will start typing. You can move the cursor along the
- line using the left and right arrow keys. You can type both in
- replace and insert modes. The modes are switched with the
- Insert key. Finally, you must press the Enter key.
- There are two ways to select a filename for recognition:
- (a) move the highlighted bar using the up and down arrow
- keys and press the Enter key;
- (b) move the mouse cursor to the appropriate item and
- press the left button of the mouse twice (after the
- first click the highlighted bar will move to the
- appropriate item).
- The selected filename will be displayed on the input
- line. To proceed, you have to confirm the correctness of your
- choice. This is performed in the following ways:
- (a) if using a keyboard, activate the Ok button with the
- Tab key, as stated above (when the button is active, the
- letters "Ok" turn red), and then press the Enter key;
- (b) if using a mouse, move its cursor to the Ok button
- and press the left button of the mouse.
-
- After this the dialog box will disappear, and the
- following record will appear in the upper line of the screen:
- Recognition from <full pathname and name of the file
- selected for recognition> to <full pathname and filename for
- storing the ASCII code of the recognized text>.
- The system itself proposes to save the recognized text in
- a file that is in the same directory as the original file and
- retains the name of the original but with a new extension:
- .TXT. If the full pathnames are too long to fit on one line,
- only the head parts of the pathnames (that is, the name of
- the current drive) and the tail parts (that is, the
- filenames) are displayed.
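The naming rule described above can be sketched as follows. This is an illustration only, not InterRec's own code: the function names are hypothetical, and the "..." placed between the drive and the filename in the shortened form is an assumption, since the manual does not say how the omitted part is shown.

```python
from pathlib import PureWindowsPath


def default_text_path(pcx_path: str) -> str:
    """Same directory and base name as the graphics file, extension .TXT."""
    return str(PureWindowsPath(pcx_path).with_suffix(".TXT"))


def abbreviate(path: str, width: int) -> str:
    """Keep only the drive and the filename when the path is too long.

    The "..." marker for the omitted directories is an assumption.
    """
    if len(path) <= width:
        return path
    parts = PureWindowsPath(path)
    return f"{parts.drive}\\...\\{parts.name}"
```

With these definitions, `default_text_path(r"C:\SCANS\PAGE1.PCX")` gives `C:\SCANS\PAGE1.TXT`.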
- In addition, in the window in the lower part of the
- screen where there is a scroll bar, the image of the graphics
- file selected for recognition (or its fragment) will appear.
- Shading in horizontal and/or vertical scales indicates
- that the graphics file is too large to be fully displayed in
- the window. Nevertheless, the system lets you view the
- remaining parts of the graphics file. The displayed area is
- moved along the file in the following ways:
- (a) if using a keyboard, press the left, right, up and
- down keys;
- (b) if using a mouse, move the pointer to the desired
- scroll bar arrow and press the left button.
- To switch control from the mouse to the keyboard and vice
- versa you should use the Tab key.
- Unshaded areas on the scroll bar show the location of the
- displayed fragment in relation to the whole page.
- If you are satisfied with the name and location of the
- ASCII file proposed by the system, your work with the File
- item of the main menu is completed. If the proposed choice
- does not suit you, select the second option, Load.TXT.
- Work in the dialog box can be cancelled before a file is
- selected for recognition. This can be done in the following
- ways:
- (a) if using a keyboard, activate the Cancel button by
- pressing the Tab key until the word "Cancel" turns red,
- and then press Enter;
- (b) if using a mouse, move its cursor to the Cancel button
- and press the left button of the mouse.
-
- 4.1.2. Load.TXT
- This item is used to select the filename which will
- contain the recognized text.
- The file for the ASCII representation is selected by the
- same sequence of operations as the graphics file, except that
- the .TXT extension is used as the search mask and the current
- directory is the one where the file to be recognized was
- selected. The mask can be changed by typing the required
- extension in the input line as in 4.1.1. After you have
- approved the selection, the pathname and filename will be
- displayed in the upper line of the screen.
- The ASCII representation can also be stored in a new
- file. To do so you must type its name in the input line, press
- the Enter key and confirm your choice (Ok) as stated above.
- The pathname and the filename will be displayed in the upper
- line of the screen.
- After selecting the files for recognition and saving the
- ASCII representation, you should check the correctness of your
- choice. If you are satisfied, your work with the File function
- is completed; otherwise, you need to return to the appropriate
- function (Load.PCX or Load.TXT) and make a new selection
- following the above sequence of operations.
-
- 4.1.3. Exit
- Selecting this item exits the InterRec system.
-
- 4.2. Screen
- This item is used to select the form in which information
- is displayed on the screen in the recognition process. Thus
- the submenu has three items: Graphics, All, ASCII.
-
- 4.2.1. Graphics
- When the Graphics item is selected, the window for viewing
- the graphics file increases in size. In this enlarged window
- the graphics file can be viewed in the same way as in 4.1.1.
- During the recognition process the graphics file is scrolled
- in the window. The recognition results are not displayed.
-
- 4.2.2. All
- When this item is chosen, the windows do not change their
- positions on the screen. During recognition the lower window
- displays the graphics file being scrolled, and the upper
- window displays the resulting text file.
- This mode is set by default if the Screen item of the
- main menu is not invoked.
-
- 4.2.3. ASCII
- When this item is selected, the whole screen is used for
- a large window that shows the recognition results, i.e.
- the resulting ASCII file. This mode is faster than the
- previous one.
-
- 4.3. Language
- This item lets you switch operating modes of the
- recognition program with regard to the language of the source
- text. The selected mode is marked with a tick. English is the
- default language.
- The text editor must have an appropriate screen driver to
- let you read the files containing the recognized Russian text.
- If your text editor has no such driver, use the enclosed
- drivers that come with the InterRec system. To use them, you
- must start egarfont.com (with an EGA video adapter) or
- vgarfont.com (with a VGA video adapter) before you work with
- the editor. The character coding used by the system is given
- in the Appendix.
-
- 4.4. Start
- This item of the main menu has no submenu and is used to
- start the recognition program.
- If a file selected for ASCII representation already
- exists, the following message will appear on the screen:
-
- ASCII file already exists
- New Add Cancel
-
- When using a keyboard, select the required item with the
- left and right arrow keys and then press the Enter key. The
- selected item changes to red.
- If you choose the New option, the previous contents of the
- text file will be lost. If the Add item is selected, the
- recognition result is added to the contents of the file. This
- mode of operation can be effectively used if you want to
- combine into a single file the results of processing multiple
- sheets. As soon as you select an item, recognition will begin.
- Information will be displayed on the screen as described
- in 4.2. The unshaded area in the vertical scroll bar
- corresponds to the page fragment being recognized.
- Recognition will not start if the Cancel item is chosen.
- Program execution can be cancelled by pressing Esc.
- When the program finishes, a message will appear on the
- screen: Successful recognition.
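The effect of the New, Add and Cancel items on an existing text file can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes that New corresponds to overwriting the file and Add to appending to it, as the description above suggests.

```python
def write_result(path: str, data: bytes, choice: str) -> bool:
    """Store recognized text according to the New / Add / Cancel choice.

    Returns False when nothing was written (Cancel), True otherwise.
    """
    if choice == "Cancel":
        return False
    if choice not in ("New", "Add"):
        raise ValueError(f"unknown choice: {choice!r}")
    # New discards the previous contents; Add keeps them and appends.
    mode = "wb" if choice == "New" else "ab"
    with open(path, mode) as out:
        out.write(data)
    return True
```

Processing several sheets into one file then amounts to choosing New for the first sheet and Add for each following sheet.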
-
- 5. Possible User Errors
-
- The system provides you with several messages which
- indicate possible incorrect actions of the user.
-
- 5.1. File can't be opened
- This message appears when a non-existing file is selected
- for recognition.
-
- 5.2. Invalid drive or directory
- The message indicates that while selecting the file, the
- user has typed the wrong name of the drive or directory.
-
- 5.3. This disk can't be set
- The message indicates that, when trying to open a file,
- the user has typed the name of a non-existing drive.
-
- 5.4. Data reading error
- This message appears when an error occurs in reading from
- the disk.
-
- 5.5. Invalid PCX format
- The message indicates that the file selected for
- recognition does not meet the requirements of PCX format,
- version 2.8 and lower. In this case you need to create the
- file for recognition again.
-
- 5.6. Not compatible format
- The message indicates that a file with a color or
- gray-scale image, which the current system version does not
- support, was selected for recognition.
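A graphics file can be examined before it is given to the system. The sketch below reads the standard 128-byte PCX header; it is not InterRec's own check, the function names are hypothetical, and the mapping of header fields to the two messages above is an assumption made for illustration.

```python
import struct

PCX_HEADER_SIZE = 128
PCX_MANUFACTURER = 0x0A  # first byte of every PCX file


def check_pcx_header(header: bytes) -> str:
    """Classify a PCX header as two-color, unsupported, or invalid."""
    if len(header) < PCX_HEADER_SIZE or header[0] != PCX_MANUFACTURER:
        return "Invalid PCX format"
    bits_per_pixel = header[3]   # bits per pixel in each plane
    planes = header[65]          # number of color planes
    # Two colors means one bit per pixel in a single plane.
    if bits_per_pixel != 1 or planes != 1:
        return "Not compatible format"
    return "OK"


def pcx_geometry(header: bytes) -> tuple:
    """Return (width, height, horizontal dpi, vertical dpi)."""
    xmin, ymin, xmax, ymax, hdpi, vdpi = struct.unpack_from("<6H", header, 4)
    return xmax - xmin + 1, ymax - ymin + 1, hdpi, vdpi
```

The resolution fields returned by `pcx_geometry` are only as reliable as the scanning software that wrote them, so they are best treated as a hint when checking the resolution requirement of section 2.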
-
- 5.7. Graphic file not loaded
- This message appears on an attempt to start recognition
- when the graphics file has not yet been selected.
-
- 5.8. File .TRR can't open
- This message appears during recognition when there are no
- rus.trr or eng.trr files in the current directory.
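The presence of these files can be verified before starting the program. This is a small sketch with a hypothetical function name; that the ENGLISH mode uses eng.trr and the RUSSIAN mode uses rus.trr is inferred from the filenames and is an assumption.

```python
from pathlib import Path

# Assumed pairing of recognition modes and .trr files.
TRR_FILES = {"ENGLISH": "eng.trr", "RUSSIAN": "rus.trr"}


def trr_available(mode: str, directory: str = ".") -> bool:
    """Check that the .trr file for the mode exists in the directory."""
    return (Path(directory) / TRR_FILES[mode]).is_file()
```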
-
- 5.9. Insufficient memory
- The message warns you that there is not enough RAM for the
- program. In this event, you need to unload, if possible,
- some resident programs the system can do without.
-
- 5.10. Recognition is completed
- The message may appear in the following cases: the file
- contains some non-text elements of great vertical size; the
- characters are too large; two or more adjacent lines have
- merged or overlapped; the lines of the input image and the
- upper (lower) edge of the sheet of paper are far from being
- parallel.
-
- In addition to the above mentioned messages, some other
- error messages may appear on the screen (however, this is
- highly improbable). In such a case, you need to consult the
- designers of the system or their representative in your
- country.
-
-
- Appendix
-
- Alternative ASCII code used by the InterRec system
-
- ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
- │ А │ Б │ В │ Г │ Д │ Е │ Ж │ З │ И │ Й │ К │ Л │ М │ Н │ О │ П │
- │128│129│130│131│132│133│134│135│136│137│138│139│140│141│142│143│
- ├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
- │ Р │ С │ Т │ У │ Ф │ Х │ Ц │ Ч │ Ш │ Щ │ Ъ │ Ы │ Ь │ Э │ Ю │ Я │
- │144│145│146│147│148│149│150│151│152│153│154│155│156│157│158│159│
- ├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
- │ а │ б │ в │ г │ д │ е │ ж │ з │ и │ й │ к │ л │ м │ н │ о │ п │
- │160│161│162│163│164│165│166│167│168│169│170│171│172│173│174│175│
- ├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
- │ р │ с │ т │ у │ ф │ х │ ц │ ч │ ш │ щ │ ъ │ ы │ ь │ э │ ю │ я │
- │224│225│226│227│228│229│230│231│232│233│234│235│236│237│238│239│
- ├───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┼───┤
- │ Ё │ ё │ ≥ │ ≤ │ ⌠ │ ⌡ │ ÷ │ ≈ │ ° │ ∙ │ · │ √ │ ⁿ │ ² │ ■ │   │
- │240│241│242│243│244│245│246│247│248│249│250│251│252│253│254│255│
- └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
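On a modern system, a file of recognized Russian text can be converted to Unicode. The sketch below assumes that the coding in this Appendix places the Cyrillic letters at codes 128 to 175 and 224 to 241, as in the widely used "alternative" DOS coding, and therefore uses Python's cp866 codec. Codes 242 to 255 may differ between variants of that coding, so this is an approximation. The function name is hypothetical.

```python
def decode_recognized(data: bytes) -> str:
    """Decode bytes written in the 'alternative' coding to a Unicode string.

    Assumption: the Cyrillic letters match Python's cp866 codec.
    """
    return data.decode("cp866")
```

For instance, the byte values 143, 224, 168, 162, 165, 226 decode to the word "Привет" under this assumption.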